-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Addition of the InitialDataAdder abstraction #383
Addition of the InitialDataAdder abstraction #383
Conversation
Creating abstract InitialDataAdder type for use with creating datasets, and modifying Dataset.create abstract function to use this type for the 'initial_data' parameter, rather than a string that represents the location of some kind of initial data.
Updating DatasetManager implementation's create function to support usage of the InitialDataAdder abstraction for the 'initial_data' param.
Updating to depend on core 0.9.0.
At first glance, it appears that this is adding responsibility to dataset creation that creation should not be responsible for. Do you think could handle this use case if we instead introduced a dataset transaction or workflow abstraction? For starters we could limit the functionality to only operate on a single dataset. It seems easier to model data transformation pipelines using a workflow (weaker guarantees than transaction) or transaction. |
My perspective is actually the opposite. This is creating an abstract type dedicated to obtaining and adding initial data to a dataset. Regarding whether dataset creation should be tied to adding data ...
A transactional approach is essentially the intent here. If/when there is some expected, initial data for a dataset, but it cannot be added, then DatasetManager.create() should (effectively) not create the dataset. For example, don't create a composite config dataset at all - even if we can/did add the realization config file to it just fine - if there is a failure copying files from the standalone BMI init config dataset we are supposed to copy from. Of course, the manager must create that dataset to have something to add to. My solution was to ensure initial data was added inside create() by developing an abstraction somewhat similar to a Visitor, passed to and used by create(), that would encapsulate the logic for adding any initial data. That way, if there is trouble, create() can catch any exceptions and clean up the bad dataset. This may make more sense after seeing some of the initial implementations. I just didn't include those here because they were in the dmod.dataservice package and not needed for this part. |
Right, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like a minor change is needed to clean up a bucket's objects before removing the bucket in the case of a failure.
# manager and the object store itself | ||
except Exception as e: | ||
self.datasets.pop(name) | ||
self._client.remove_bucket(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will also need to remove any files that were possibly written to the bucket (i.e. if add_initial_data()
successfully added a portion of its data). remove_bucket
only removes empty buckets.
Fixing ObjectStoreDatasetManager create() function so that it calls the delete() function if initial data fails to be added (and thus the dataset does not need to end up created); previous logic that just removed the bucket would fail if the bucket had anything in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thanks, @robertbartel!
A portion of a larger set of changes involved in work with moving to composite configuration dataset types. Breaking this off parts like this that are fairly self contained for easier review.
Additions
Changes
create
function to use InitialDataAdder instead of a str for itsinitial_data
paramcreate
function comply with interface changes and use theinitial_data
object as a InitialDataAdder appropriately.